
Gravitational pull


Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization ("softmax gravity well"), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence ("softmax damping"). Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the escort mapping, that demonstrates better optimization properties. The disadvantages of the softmax and the effectiveness of the escort transformation are further explained using the concept of the NŁ coefficient. In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation.
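For readers who want to experiment, here is a minimal sketch of the two maps the abstract compares. The softmax is standard; for the escort mapping we assume the form π(a) = |θ_a|^p / Σ_b |θ_b|^p with an exponent p ≥ 1. The default p = 2 and the function names `softmax` and `escort` are our own illustrative choices, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Standard softmax: pi(a) = exp(theta_a) / sum_b exp(theta_b)."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def escort(theta: Sequence[float], p: float = 2.0) -> List[float]:
    """Escort mapping (assumed form): pi(a) = |theta_a|^p / sum_b |theta_b|^p."""
    powers = [abs(x) ** p for x in theta]
    total = sum(powers)
    if total == 0.0:
        raise ValueError("escort mapping is undefined when every theta_a is zero")
    return [w / total for w in powers]


if __name__ == "__main__":
    theta = [1.0, 2.0, 3.0]
    print(softmax(theta))
    print(escort(theta))  # equals [1/14, 4/14, 9/14] for p = 2
    # Softmax ignores a constant added to every logit...
    print(softmax([x + 10.0 for x in theta]))
    # ...while the escort map ignores rescaling theta by a nonzero constant.
    print(escort([5.0 * x for x in theta]))
```

The two maps have different symmetries: the softmax is unchanged when a constant is added to every logit, whereas the escort mapping is unchanged when θ is rescaled by any nonzero constant.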


Brightest supermoon of 2025 lights up the sky this week

Popular Science

As the year's penultimate month kicks off, the year's brightest supermoon is almost here: this month's full moon will come within about 222,000 miles of Earth. (Photo: The supermoon rises from the sea in Molfetta, Italy, on October 7, 2025. It was the first of three consecutive supermoons in 2025.)


The universe may die sooner than expected

Popular Science

Nothing is permanent, not even the universe itself. At least, that's what current models of physics tell us about the nature of existence. Luckily for humanity, most astrophysicists' estimates don't have the universe's grand finale scheduled for another 10¹¹⁰⁰ years or so (that's a 1 followed by 1,100 zeros). However, based on new calculations that include the peculiar nature of certain black hole particles, the universe's curtains may fall much sooner than expected, cosmically speaking.


Review for NeurIPS paper: Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

Summary and Contributions: ##Update## The rebuttal adequately addressed my main concerns, and I am consequently increasing my score to 7. In particular, I was pleased that the authors investigated the issues with the learning rate; I would be happy if they mention this potential limitation in their revision and include the experimental results showing that the naive adaptive learning-rate proposals I made would not be effective. It was also pleasing that they will discuss and compare with Neural Replicator Dynamics, and the additional experiment with sampled actions also looks promising. I did not increase my score further because the current set of experiments is still rather simple, and it is difficult for me to assess whether the new method is likely to be widely used. That said, I feel the contribution may well turn out to be much more influential.


Review for NeurIPS paper: Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

This paper proposes an alternative to two common practices in machine learning: softmax policy gradient in RL, and softmax parameterization in classification when minimizing cross-entropy loss. The limitations of softmax in these two cases are well explained, and the paper will be interesting to a wide range of the NeurIPS community.


Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization ("softmax gravity well"), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence ("softmax damping"). Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the escort mapping, that demonstrates better optimization properties.
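To make the "gravity well" concrete, here is a toy experiment of our own, not one from the paper: exact gradient ascent on the expected reward J = Σ_a π(a) r(a) of a two-action bandit, starting from a distribution that puts about 99% of its mass on the worse action. The escort form (p = 2), the rewards, the step size 0.1, the 0.9 stopping threshold and all function names are illustrative assumptions. A shared step size is not a fair comparison in general, because the escort gradient scales with 1/|θ|.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def escort(theta: Sequence[float], p: float = 2.0) -> List[float]:
    # Assumed escort form: pi(a) = |theta_a|^p / sum_b |theta_b|^p
    powers = [abs(x) ** p for x in theta]
    total = sum(powers)
    return [w / total for w in powers]


def expected_reward(pi: Sequence[float], rewards: Sequence[float]) -> float:
    return sum(p_a * r_a for p_a, r_a in zip(pi, rewards))


def softmax_steps(logits: Sequence[float], rewards: Sequence[float],
                  lr: float, target: float, max_steps: int) -> int:
    """Gradient-ascent steps until pi(best action) >= target (softmax)."""
    theta = list(logits)
    best = max(range(len(rewards)), key=lambda a: rewards[a])
    for step in range(max_steps):
        pi = softmax(theta)
        if pi[best] >= target:
            return step
        j = expected_reward(pi, rewards)
        # Softmax policy gradient: dJ/dtheta_a = pi(a) * (r(a) - J)
        theta = [t + lr * p_a * (r_a - j) for t, p_a, r_a in zip(theta, pi, rewards)]
    return max_steps


def escort_steps(theta0: Sequence[float], rewards: Sequence[float],
                 lr: float, target: float, max_steps: int, p: float = 2.0) -> int:
    """Gradient-ascent steps until pi(best action) >= target (escort)."""
    theta = list(theta0)
    best = max(range(len(rewards)), key=lambda a: rewards[a])
    for step in range(max_steps):
        pi = escort(theta, p)
        if pi[best] >= target:
            return step
        j = expected_reward(pi, rewards)
        s = sum(abs(t) ** p for t in theta)
        # dJ/dtheta_b = p * sign(theta_b) * |theta_b|^(p-1) / S * (r(b) - J)
        theta = [
            t + lr * p * math.copysign(abs(t) ** (p - 1), t) / s * (r_a - j)
            for t, r_a in zip(theta, rewards)
        ]
    return max_steps


if __name__ == "__main__":
    rewards = [1.0, 0.0]  # action 0 is optimal
    theta0 = [0.1, 1.0]   # escort init: pi is about (0.0099, 0.9901)
    logits0 = [math.log(q) for q in escort(theta0)]  # softmax init with the same pi
    n_soft = softmax_steps(logits0, rewards, lr=0.1, target=0.9, max_steps=100_000)
    n_esc = escort_steps(theta0, rewards, lr=0.1, target=0.9, max_steps=100_000)
    print(f"softmax: {n_soft} steps, escort: {n_esc} steps")
```

Under these assumptions the softmax run needs well over a hundred steps, because its gradient π(a)(r(a) − J) is tiny wherever π(a) is tiny, while the escort run escapes in fewer than fifty.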


Brightest and hungriest black hole ever detected: Terrifying void gobbles up one Sun every single day, scientists say

Daily Mail - Science & tech

Astronomers have found the brightest object in the universe: a 'hellish' black hole that consumes a star a day. Described as 'the most hellish place in the universe', the black hole is 12 billion light-years away and has a mass roughly 17 billion times that of our Sun. Due to their immense gravitational pull, black holes grow in mass by capturing nearby material, whether stars, planets or even other black holes. The matter being pulled in toward this black hole, known as J0529-4351, forms a whopping disc that measures seven light-years in diameter. All galaxies have a supermassive black hole at their cores.


The 20 most puzzling questions in modern life revealed - so do YOU know the answers?

Daily Mail - Science & tech

What is an NFT? (34%) Non-fungible tokens (NFTs) are generally digital art pieces or music that can be bought or traded online. These are unique computer files encrypted with an artist's signature. As a result, they cannot be replicated, acting as a digital certificate of ownership and authenticity. In other words, buying an NFT is almost like the more traditional purchasing of fine art - except in a digital form. Artists can sell pieces that may be tricky to advertise otherwise, such as digital stickers.


'Ghost black hole' from a previous universe is 'found' by astrophysicists

Daily Mail - Science & tech

The eccentric view comes from Oxford University mathematical physicist Roger Penrose, State University of New York Maritime College mathematician Daniel An and University of Warsaw theoretical physicist Krzysztof Meissner. These leading thinkers are now calling for a modified version of the Big Bang to account for this multiverse theory. The theory, called conformal cyclic cosmology (CCC), states that universes develop, expand and die in sequence, and that the black holes in each one leave their mark on the universe that follows. Recently published work argues that these marks are detectable in existing data from the cosmic microwave background (CMB).